Abstract. Most existing approaches to community detection require complete information about the graph at a specific scale, which is impractical for many social networks. We propose a novel algorithm that, instead of adopting a global approach, focuses on local social ties and models the multiple scales of social interaction occurring in those networks. Our method optimizes, for the first time, the topological entropy of a network, and uncovers communities through a novel dynamic system that converges to a local minimum by simply updating the membership vectors, at very low computational complexity. It naturally supports overlapping communities by associating each node with a membership vector that describes the node's involvement in each community. In addition to uncovering overlapping communities, we can also describe different multi-scale partitions by tuning the characteristic size of the modules relative to the optimal partition. Because of its high efficiency and accuracy, the algorithm is suitable for the accurate detection of community structure in real networks.

1 Introduction 
Since the publication of the seminal work of Barabasi and Albert [T], many real complex systems have been examined from the viewpoint of complex networks. Complex networks, which arise naturally in a vast range of physical phenomena, can describe complex systems containing massive numbers of units (or subsystems), with nodes representing the component units and edges standing for the interactions among them. Social networks are a representative class of complex networks closely related to our daily life; examples include the World Wide Web [5], traffic networks [3], sexual networks [1], and article citation networks [5].

The study of the community structure of social networks has become a very important issue in the field of complex networks. Nodes that belong to a tight-knit community are more likely to have particular properties in common, so it is important to identify communities in social networks. Taking the WWW as an example, web pages are more likely to link to pages on related topics, and such sets of pages may correspond to communities. Based on this, search engines may increase the precision and recall of search results by focusing on narrow but topically related subsets of the web.

The problem of finding communities in complex social networks has been studied for decades. Recently, several quality functions for community structure have been proposed to solve this problem 0 Cl 0. Among them, modularity Q has become the most popular 0 0 [13 and has been pursued by many researchers nn] HI] US]. However, most of these approaches require knowledge of the entire graph structure and identify global communities from global information; that is, one needs access to the information of the whole network. This constraint is impractical for large complex networks, because it is difficult to know the whole network completely. Moreover, statistical methods can only detect the most significant connectivity patterns and ignore the multi-scale topology of communities. Such identifications lack the advantage of providing a coarse-grained representation of the system, so they cannot sketch its organization or identify the sets of nodes that are likely to have hidden functions or properties in common.

Because of these limitations, in this paper we present a novel community detection algorithm focused on social networks. Instead of adopting a global approach, the algorithm uses local information and models the multi-scale social interaction patterns occurring in those networks. Our method is the first to optimize the topological entropy, which represents the statistical significance of a network. Although the topological entropy function is not convex and a standard optimization algorithm cannot be expected to find its global minimum, we develop a novel dynamic system that converges to a local minimum by simply updating the membership vector, with low computational complexity. The number of communities does not need to be specified in advance. The method naturally supports overlapping communities by associating each node with a membership vector describing the node's involvement in each community. Theoretical analysis and experiments show that the algorithm uncovers communities quickly and accurately.
The outline of this paper is as follows. Section 2 introduces the problem of community detection in social networks and the motivation behind our algorithm. In Section 3, we present our algorithm in three steps and explain each of them. In Section 4, we analyze some important properties of the algorithm, and in Section 5, we run it on benchmark networks and on several real social networks that represent a good fraction of the wide body of social networks. Section 6 concludes the paper.

2 Motivation 

Given a network G = (V, E) containing n nodes, suppose we can divide the nodes into a groups, and that for each group we can select a “leader”. The leaders should have two properties: they should be well connected to the members of their group, and they should be able to communicate with other leaders when necessary. If a distributed algorithm is carried out in each group separately and the leaders communicate on a higher level, the agents can enjoy a faster convergence rate.

It is natural to relate social networks to hierarchical structure. In such a hierarchy, leader nodes are more important than other nodes and are hence located on a higher level; the leader naturally sits on the highest level of its hierarchy. Taking the DNS network [5] as an example, the root server is a natural leader and is located at the top of the hierarchy when an IP address is looked up (see Fig. 1). Since hierarchies are a consequence of the spreading of correlation, and so are communities, we believe that identifying these hierarchies in a network will lead to a natural community detection. The area over which a leader has the most influence should define its community, so community detection is performed by finding all natural leaders and all the nodes they influence. Partitions obtained in this way can be naturally explained. This approach also satisfies another intuitive property that a community should possess, namely that nodes in the same community are connected by short paths.

Fig. 1. Hierarchical structure of the DNS network for the domain name “www.abc.co.com”. The most influential node, the root server, is located on the highest level of the hierarchical tree. The servers for “www”, “abc”, “co” and “com” are located at lower levels. To obtain the IP address, users need to send an inquiry from the highest-level root server down to the lowest-level WWW server. Node size depicts the level in the hierarchy, with bigger nodes located at higher levels.

Given a graph, individual nodes only have local knowledge of its structure, namely information about their neighboring nodes. If a node wants to improve its own performance, it needs to know more about the global picture of the network; it can use this information to refine its choice of neighbors. However, obtaining this information incurs a high computational cost. The most complete description of global graph structure is the adjacency matrix, but since each node has limited memory, energy, and computational capacity, it is difficult to use the adjacency matrix directly. Our goal is to devise a scheme that provides each node with a small vector containing compact global information on how the node is located with respect to the other nodes. The scheme should be implementable in a distributed manner.

Moreover, a powerful method for uncovering the modules of social networks should work in a multi-scale way m cs. Such an identification has the advantage of providing a coarse-grained representation of the system, which allows one to sketch its organization and to identify sets of nodes that are likely to have hidden functions or properties in common. Most community detection methods find a partition of the nodes into communities such that most of the links are concentrated within the communities. Each node is assigned to one and only one community, i.e., partitions are not compatible with overlapping communities [17] [18]. At the heart of most partitioning methods is a mathematical definition of what is thought to be a good partition. Once this quality function has been defined, different types of heuristics can be used to find its optimal partition approximately, i.e., the partition having the highest value of the quality function.


3 The algorithm 

For a network G = (V, E) with n nodes, we develop a distributed algorithm that categorizes each node as a “leader” or a “regular” node using local information. The method then assigns each regular node a membership vector in a multi-scale way, indicating which leaders have more influence on it. This provides the nodes with some global picture of the network. The algorithm consists of the three steps described below.

3.1 Leadership of nodes 

First, we calculate the leadership f(i) of every node i in the network. The leadership f(i) represents how important the opinion of node i is in the network. The node leadership function is defined as

$$ f(i) = \sum_{j=1}^{n} e^{-\left(d_{ij}/\delta\right)^{2}} \qquad (1) $$

where d_ij is the shortest-path distance from vertex i to vertex j, and δ ∈ (0, +∞) is the influence factor, which controls the range of mutual action between nodes. By the properties of the exponential function e^{-(d_ij/δ)^2}, for a given value of δ the influence range of each node is approximately 3δ/√2 hops. When d_ij is larger than 3δ/√2, the value of the exponential rapidly decays to 0, so we can use δ to control the influence range of a node and calculate f(i) only within the range d_ij < 3δ/√2. Nodes in the dense regions of a network have higher leadership. The nodes with the largest leadership have the most links with other nodes and can be viewed as candidate leader nodes. Therefore, we can use node leadership to represent the importance of a node in the network.

Hui-Jia Li et al.: Identifying overlapping communities in social networks using multi-scale local information expansion 5
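Eq. (1) and the truncation rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is unweighted and undirected and stored as a dictionary `adj` mapping each node to its neighbours, the j = i term (d_ii = 0) is included in the sum, and one breadth-first search per node is stopped at the influence radius 3δ/√2. The function name `leadership` is hypothetical.

```python
import math
from collections import deque


def leadership(adj, delta):
    """Leadership f(i) = sum_j exp(-(d_ij / delta)^2), as in Eq. (1).

    Only nodes with d_ij < 3 * delta / sqrt(2) are summed; beyond that
    radius the Gaussian term is treated as negligible.
    """
    radius = 3 * delta / math.sqrt(2)
    f = {}
    for source in adj:
        dist = {source: 0}          # BFS hop distances from `source`
        queue = deque([source])
        total = 0.0
        while queue:
            u = queue.popleft()
            total += math.exp(-(dist[u] / delta) ** 2)
            for v in adj[u]:
                # expand only to nodes that stay strictly inside the radius
                if v not in dist and dist[u] + 1 < radius:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        f[source] = total
    return f


# Path graph 0-1-2: the middle node has the largest leadership.
adj = {0: [1], 1: [0, 2], 2: [1]}
print(leadership(adj, 1.0))
```

With δ below √2/3 the radius is smaller than one hop, so every node only counts itself and f(i) = 1 for all nodes, which matches the no-interaction regime discussed in Sect. 4.1.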

3.2 Identifying the leader nodes 

Identifying the leader nodes of communities is very important for analyzing the properties of complex networks. Many measures can be used to define the “key nodes”, such as the nodes with the largest degree or betweenness centrality. Here, we use node leadership to search for leader nodes. According to the notion of community structure, the density of links inside a community is higher than that of links to the rest of the network. Each community represents a local region with relatively high correlation, and the leader node of the community has the highest leadership and is tightly linked to the other nodes. Moreover, different communities are separated by nodes of locally lowest leadership, the boundary nodes.

Note that in the rare cases where two or more leaders are also each other's most influential neighbors, they are grouped together and become leaders of one group. For example, in a fully connected network all nodes are leaders of one community, whereas in a ring network each node is the leader of its own community. Specifically, if the distance between two highest-leadership nodes is less than 3δ/√2, we group them together and consider them to be in one group. Finding the leader nodes only requires a simple breadth-first search; once a leader is found, we choose a random node and restart the process until convergence. The computational complexity is O(m), where m is the number of edges in the network.
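One possible reading of the leader search above is sketched below in Python. The details are assumptions made for illustration: a node is taken as a candidate leader when no neighbour has strictly higher leadership, candidates closer than a merge radius (3δ/√2 in the text) are grouped greedily in node order, and the leadership values `f` are passed in directly. The function name `find_leaders` is hypothetical.

```python
from collections import deque


def find_leaders(adj, f, merge_radius):
    """Return leader groups: local maxima of f, merged when closer than
    `merge_radius` hops (greedy, in node order; a sketch, not the paper's code)."""
    # candidate leaders: no neighbour has strictly larger leadership
    candidates = [i for i in adj if all(f[i] >= f[j] for j in adj[i])]
    candidate_set = set(candidates)
    groups, assigned = [], set()
    for c in candidates:
        if c in assigned:
            continue
        group, dist, queue = [c], {c: 0}, deque([c])
        assigned.add(c)
        # BFS limited to the merge radius; pick up other unassigned candidates
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist and dist[u] + 1 < merge_radius:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                    if v in candidate_set and v not in assigned:
                        group.append(v)
                        assigned.add(v)
        groups.append(group)
    return groups


# Path 0-1-2-3-4 with two leadership peaks at nodes 1 and 3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
f = {0: 1.0, 1: 3.0, 2: 1.0, 3: 3.0, 4: 1.0}
print(find_leaders(adj, f, merge_radius=2))  # [[1], [3]]
print(find_leaders(adj, f, merge_radius=3))  # [[1, 3]]
```

In a fully connected graph every node has the same leadership and is a candidate, so all nodes end up in a single leader group, in line with the example given in the text.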


3.3 Determining the membership using random walk 

At this step, our goal is to provide each node with a small vector that contains compact global information on how the node is located with respect to the other nodes. We define the membership vector using the properties of random-walk dynamics on graphs. Consider a graph with a leaders l_1, l_2, ..., l_a and n − a regular nodes. Given the leaders and an arbitrary order assigned to them, we describe how to determine the membership vector of each regular node. We denote the membership vector of node i by x_i = (x_i^1, x_i^2, ..., x_i^a) ∈ R^a, and by x_i^k(t) the k-th entry of the membership vector of node i at time t.

The procedure operates as follows. The membership vector of leader l_k is first set to the k-th unit vector, and these a vectors do not vary. For each regular node i, the entries x_i^k (k = 1, 2, ..., a) are initialized randomly, uniformly distributed on [0, 1]. We then normalize each row of the membership matrix X (whose ith row is x_i) so that Σ_k x_i^k = 1 for every regular node i. At each iteration t, the membership vector of each regular node i is updated entry-wise (k = 1, 2, ..., a) using the following rule:

$$ x_i^k(t+1) = \frac{1}{\sum_j a_{ij} + 1} \left[ x_i^k(t) + \sum_j a_{ij}\, x_j^k(t) \right] \qquad (2) $$

where A = {a_ij} is the adjacency matrix, with a_ij = 1 if nodes i and j are connected and a_ij = 0 otherwise.

Note that Σ_{k=1}^{a} x_i^k(t) = 1 for all t. For the rows of regular nodes, Equation (2) is equivalent to X(t+1) = P X(t) with P = (I + D)^{-1}(A + I), where D is the diagonal degree matrix; P is a stochastic random-walk matrix. In fact, the influence x_i^k of leader node l_k (k = 1, 2, ..., a) on a regular node i is the probability that a random walker starting from i hits l_k before it hits any other leader node. If the underlying graph is connected, x_i(t) converges as t → ∞ to a set of unique vectors, which can naturally be interpreted as the probabilities that a regular node belongs to the communities of the given leader nodes. As a result, although the leadership of a node contains only local information, the random-walk dynamics yield memberships that contain a global view of the whole graph.
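The iteration of Eq. (2) can be sketched as below. It is an illustrative Python version under stated assumptions: `adj` is an unweighted adjacency dictionary, `leaders` is the ordered list l_1, ..., l_a, all regular nodes are updated synchronously from the previous iterate, and a fixed number of iterations (the hypothetical `iterations` parameter) stands in for a convergence test.

```python
import random


def membership(adj, leaders, iterations=1000, seed=0):
    """Iterate x_i^k(t+1) = (x_i^k(t) + sum_j a_ij x_j^k(t)) / (d_i + 1).

    Leader rows are fixed unit vectors; regular rows start uniformly at
    random in [0, 1) and are normalised so that each row sums to 1.
    """
    rng = random.Random(seed)
    a = len(leaders)
    position = {leader: k for k, leader in enumerate(leaders)}
    x = {}
    for i in adj:
        if i in position:
            x[i] = [1.0 if k == position[i] else 0.0 for k in range(a)]
        else:
            row = [rng.random() for _ in range(a)]
            total = sum(row)
            x[i] = [v / total for v in row]
    for _ in range(iterations):
        new_x = {}
        for i in adj:
            if i in position:
                new_x[i] = x[i]          # leader vectors never change
                continue
            deg = len(adj[i])
            new_x[i] = [(x[i][k] + sum(x[j][k] for j in adj[i])) / (deg + 1)
                        for k in range(a)]
        x = new_x
    return x


# Path 0-1-2-3-4 with leaders 0 and 4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
x = membership(adj, leaders=[0, 4])
print([round(v, 3) for v in x[1]])  # [0.75, 0.25]
```

On this path the converged values equal the probabilities (4 − i)/4 that a walker starting at node i reaches leader 0 before leader 4, which illustrates the random-walk interpretation given above.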

4 Some descriptions of the algorithm 

In this section we describe several important properties of the algorithm, including computing the influence factor δ to recognize multi-scale communities, identifying the leaders using local information, determining the overlapping nodes, and estimating the complexity of the algorithm.


Fig. 2. (a) A simple network with eleven nodes. (b) Plot of topological entropy H versus influence factor δ.


4.1 Determining the influence factor δ to recognize multi-scale communities

According to the definition of leadership, the algorithm is controlled by only one parameter, the influence factor δ, so we can naturally use δ to control the scale of the community structure detected by our method. Here, we introduce the topological entropy H [73] [7S], which represents the statistical significance of a network, to choose a suitable δ. For a network G = (V, E) with V = {v_1, v_2, ..., v_n} and leadership values f(1), f(2), ..., f(n), the topological entropy is defined as

$$ H = -\sum_{i=1}^{n} \frac{f(i)}{\sum_{j=1}^{n} f(j)} \log \frac{f(i)}{\sum_{j=1}^{n} f(j)} \qquad (3) $$
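Eq. (3) is the Shannon entropy of the normalized leadership values. A minimal Python sketch, assuming the natural logarithm (the base only rescales H) and a plain list of leadership values as input; the function name `topological_entropy` is hypothetical.

```python
import math


def topological_entropy(f_values):
    """H = -sum_i p_i log p_i with p_i = f(i) / sum_j f(j), as in Eq. (3)."""
    total = sum(f_values)
    return -sum((v / total) * math.log(v / total) for v in f_values if v > 0)


print(topological_entropy([1.0, 1.0, 1.0, 1.0]))   # uniform: equals log 4 (about 1.386)
print(topological_entropy([10.0, 1.0, 1.0, 1.0]))  # concentrated: smaller than log 4
```

If every node has the same leadership, H takes its maximum value log n. With the truncation of Sect. 3.1 this happens for δ < √2/3, where f(i) = 1 for every node, and it is approached for very large δ on a connected graph, where every f(i) tends to n; a concentrated leadership distribution gives a smaller H.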


A small H indicates a stable and suitable partition. As a simple example, consider the network of 11 nodes shown in Fig. 2(a) and calculate the topological entropy for different values of δ. As shown in Fig. 2(b), when δ increases from 0, the entropy begins to decrease and reaches its minimum of 2.2805 at a specific value of δ (δ = 1.26). As δ moves beyond this optimal value, the entropy increases with δ and finally reaches its maximal value.
Therefore, finding an optimal δ is equivalent to minimizing the single-parameter nonlinear function H(δ), for which many algorithms can be used, for example, random search and simulated annealing. However, values of δ corresponding to a small but not minimal H are also meaningful. In particular, according to the properties of leadership, the influence range of a node is approximately 3δ/√2. When 0 < δ < √2/3, there is no interaction between nodes; every node forms a community by itself, and the number of communities is n. Similarly, when √2/3 < δ < 2√2/3, a node interacts only with its neighbors. As δ grows, nodes can influence more and more nodes, and thus the number of leaders and communities decreases. Finally, when δ > √2D/3, where D is the diameter of the network, every pair of nodes can influence each other no matter how far apart they are.
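The regimes above follow from counting how many hops fit strictly inside the radius 3δ/√2, and the one-dimensional search for the optimal δ can be carried out in many ways. The Python sketch below uses a plain grid search in place of the random search or simulated annealing mentioned in the text; `H` is assumed to be any callable returning the entropy for a given δ (for instance, the leadership of Eq. (1) followed by Eq. (3)), and both function names are hypothetical.

```python
import math


def influence_hops(delta):
    """Largest hop distance d >= 1 with d < 3*delta/sqrt(2); 0 means a node
    interacts with no other node."""
    radius = 3 * delta / math.sqrt(2)
    return max(math.ceil(radius) - 1, 0)


def grid_minimise(H, lo, hi, steps=200):
    """Evaluate H on a uniform grid over [lo, hi]; return (delta, H(delta))
    for the smallest value found."""
    best = None
    for s in range(steps + 1):
        delta = lo + (hi - lo) * s / steps
        value = H(delta)
        if best is None or value < best[1]:
            best = (delta, value)
    return best


for delta in (0.4, 0.6, 1.0):
    print(delta, influence_hops(delta))   # 0, 1 and 2 hops respectively
```

Here δ = 0.4 lies below √2/3 ≈ 0.471 (no interaction), δ = 0.6 lies between √2/3 and 2√2/3 ≈ 0.943 (neighbors only), and δ = 1.0 reaches two hops. In practice `grid_minimise` would be called with a function computing H(δ) for the network at hand; the grid resolution is an arbitrary choice.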


To show that our method can discover multi-scale community structure as δ varies, we tested the multi-scale modular structure of a classical hierarchical scale-free network with 125 nodes, RB125, proposed by Ravasz and Barabasi m. In Fig. 3(a) we plot the modular structure found with the minimal entropy H = 3.107 and with another small value H = 3.352, which shows two different scales that deserve discussion. The value of H versus δ is plotted in Fig. 3(b). We clearly observe persistent structures with 25 and 5 communities, respectively, which correspond to the most significant subdivisions in the process and reveal two hierarchical levels of the structure. The partition into 25 modules and the partition into 5 modules are highlighted on the original network.

Fig. 3. RB125, the hierarchical scale-free network. (a) The partitions into 25 and 5 modules are the most reasonable in terms of resolution, with H = 3.107 and 3.352. (b) The number of communities versus the value of δ.

4.2 Determining the leaders using local information 

In the algorithm, the leadership of a node can be determined using only local information. Detecting the leader of a community gives us very useful information about the most influential node in it: if the leader is removed, the community can be expected to suffer serious consequences, such as splitting into several smaller communities. The leader's hierarchy, or the leader's community, is the area where the leader's opinion is the most influential; this can be used, for example, for immunization against epidemic spreading. Thus, the algorithm naturally determines the number of leaders, which is also the number of communities. One interesting feature of the algorithm is that, although it automatically detects the best leaders, one can also manually specify particular nodes as leaders and build community structures around them.

4.3 Determining the overlapping nodes 

It is worth pointing out that the vast majority of community detection methods assume that the communities of complex networks are disjoint, placing each node in exactly one non-overlapping cluster. We call these methods “hard-partition” algorithms. However, in many real networks communities overlap to some extent. An important property of our algorithm is that it computes a membership vector for each node: instead of a single number denoting its membership in one community, each node has a percentage for each community. As a result, we can easily identify nodes that naturally belong to more than one community, known as overlapping nodes EH] EZ] [IS. Our method is therefore a “soft-partition” algorithm. Additionally, we can find nodes that are good followers of their leader, as well as nodes that have no distinguished leader and serve as proxies between several communities.
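The text does not fix a numerical rule for declaring a node overlapping. One simple possibility, sketched here in Python purely as an assumption, flags a node when its second-largest membership is at least a fraction `ratio` of its largest one (`ratio = 0.8` is a hypothetical default) and assigns each node a hard label from its largest entry. The function name `classify_memberships` is hypothetical.

```python
def classify_memberships(memberships, ratio=0.8):
    """Split membership vectors into hard labels and overlapping nodes.

    memberships: dict node -> list of memberships (one per community).
    Returns (labels, overlapping): labels maps each node to the index of its
    largest membership; overlapping lists nodes whose second-largest
    membership is at least `ratio` times the largest.
    """
    labels, overlapping = {}, []
    for node, x in memberships.items():
        order = sorted(range(len(x)), key=lambda k: x[k], reverse=True)
        labels[node] = order[0]
        if len(x) > 1 and x[order[0]] > 0 and x[order[1]] >= ratio * x[order[0]]:
            overlapping.append(node)
    return labels, overlapping


memberships = {1: [0.5, 0.5], 2: [0.9, 0.1], 3: [0.15, 0.45, 0.4]}
labels, overlapping = classify_memberships(memberships)
print(labels, overlapping)  # {1: 0, 2: 0, 3: 1} [1, 3]
```

Ties are broken in favor of the lower community index because Python's sort is stable even with `reverse=True`; any other tie-breaking rule would serve equally well for this illustration.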


4.4 Computational complexity 

The overall complexity of the algorithm is determined by the most expensive of its three parts, which we analyze in turn.

The first step is calculating the leadership f(i) of each node. We need to evaluate the exponential function for the pairs of nodes whose shortest-path length satisfies d_ij < 3δ/√2; the complexity of this procedure is at least O(m), where m is the number of links, and in the worst case, for a dense graph, it is O(n^2). The next step, determining the leader nodes of communities, searches for all nodes of locally highest leadership. This can be done by a simple breadth-first search with complexity O(m). The last step is very similar to a linear consensus process, whose complexity is O(n), similar to the random walk process.

In conclusion, the most expensive step is the first, i.e., calculating the leadership of nodes. Its complexity depends on the connectivity of the graph, and a densely connected graph requires more computation. Hence the overall complexity of the algorithm is O(m) at best and O(n^2) at worst.

5 Experiments 

In this section, we apply the algorithm to simulated benchmark networks (LFR networks) [5^ and to several real social networks: the karate club network of Zachary m, the scientific collaboration network m, and finally a large-scale semantic network m. The results show that the algorithm discovers multi-scale communities efficiently and accurately.

5.1 The benchmark network 

We empirically demonstrate the effectiveness of the algorithm by comparing it with five other well-known algorithms on artificial benchmark networks: Newman's fast algorithm [5], the method of Danon et al. [5], the Louvain method [35], Infomap [35], and the clique percolation method m. We use the LFR benchmark proposed by Lancichinetti et al. in |d()|. This benchmark provides networks with scale-free distributions of node degree and community size, and thus poses a much more severe test to community detection algorithms than standard benchmarks. Several parameters control the generated networks: the number of nodes N, the average node degree ⟨k⟩, the maximum node degree maxk, the mixing ratio μ (each vertex shares a fraction μ of its edges with vertices in other communities), the minimum community size minc, and the maximum community size maxc. The value of μ varies within [0, 1] and determines the level of fuzziness of the communities in the network: the larger μ, the fuzzier the communities. In our test, we use the default parameter configuration N = 1000, ⟨k⟩ = 15, maxk = 50, minc = 20 and maxc = 50.
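The mixing ratio μ can be measured on any labelled graph by averaging, over vertices, the fraction of each vertex's edges that leave its community. The Python sketch below is only an illustrative check of that definition (the function name `mixing_parameter` is hypothetical); it is not part of the LFR generator itself.

```python
def mixing_parameter(adj, community):
    """Average fraction of each vertex's edges that go to other communities."""
    fractions = []
    for u, neighbours in adj.items():
        if not neighbours:
            continue  # isolated vertices have no edges to classify
        external = sum(1 for v in neighbours if community[v] != community[u])
        fractions.append(external / len(neighbours))
    return sum(fractions) / len(fractions)


# Two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(mixing_parameter(adj, community))  # 1/9: only nodes 2 and 3 have an external edge
```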

To evaluate the community detection algorithms, we use the normalized mutual information (NMI) measure [31] to assess the partition found by each algorithm. The test focuses on whether the intrinsic scale can be correctly uncovered. The experimental results are displayed in Fig. 4, where the y-axis represents the NMI value obtained by each algorithm and each point on the curves is the average over 50 synthetic networks sampled from the above model. As we can see, all algorithms work very well when 1 − μ is larger than 0.7, with NMI larger than 0.85. Compared with the other five algorithms, our algorithm performs quite well, and its accuracy is only slightly worse than that of clique percolation for 0.5 < 1 − μ < 0.65. However, clique percolation is nearly the same as breadth-first search (BFS) and very time consuming; its complexity is much larger than that of our method.

Fig. 4. Comparison of NMI for the six algorithms.
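For hard partitions, a common form of NMI in benchmark comparisons is based on the confusion matrix N_ij (the number of nodes in found community i and true community j). The Python sketch below implements that form; applying it to our soft memberships assumes each node is first assigned to its largest membership entry, and the exact variant used in [31] may differ. The function name `nmi` is hypothetical.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """NMI = -2 sum_ij N_ij log(N_ij N / (N_i N_j))
             / (sum_i N_i log(N_i / N) + sum_j N_j log(N_j / N))."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))   # confusion matrix N_ij
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    numerator = -2 * sum(nij * math.log(nij * n / (count_a[i] * count_b[j]))
                         for (i, j), nij in joint.items())
    denominator = (sum(c * math.log(c / n) for c in count_a.values())
                   + sum(c * math.log(c / n) for c in count_b.values()))
    if denominator == 0:
        return 1.0  # both partitions put every node in a single community
    return numerator / denominator


print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # identical up to relabelling
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # independent partitions
```

NMI equals 1 when the two partitions coincide up to relabelling and 0 when they are statistically independent, which is why values above 0.85 in Fig. 4 indicate near-perfect recovery.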

As real networks may have topological properties different from those of synthetic ones, in the following we consider several widely used real-world networks to further evaluate the performance of our method.
5.2 The karate club network of Zachary

Over the course of two years in the early 1970s, Wayne Zachary observed social interactions between the members of a karate club at an American university m. He constructed networks of ties between members of the club based on their social interactions both within the club and away from it. By chance, a dispute arose during the course of his study between the club's administrator and its principal karate teacher over whether to raise club fees, and as a result the club eventually split into two smaller clubs, centered around the administrator and the teacher.

We minimize H(δ) and obtain the optimal value δ = 1.85 with H = 3.914. As shown in Fig. 5(a), the partition found by our algorithm not only matches the original split but also identifies the exact leaders: nodes 1 and 33, which have the locally highest leadership, represent the administrator and the teacher, respectively. In this instance node 3 is detected as an overlapping node because its memberships in the two communities are nearly equal. Node 3 lies on the border between the communities, so it is understandable that it is an ambiguous case.

Compared with the optimal situation, when δ is decreased to 1.41, the entropy H = 4.139 is also very small. The community structure detected in this situation, shown in Fig. 5(b), reveals another scale of relationships among the members of the karate club. Node 28 becomes another node of locally highest leadership, and the four most unstable nodes, 3, 10, 20 and 28, are marked with a dashed curve. These members are on good terms with more than one club at the same time, so they are overlapping nodes in this situation, and the number of communities detected in the karate network is now three. Furthermore, when δ is decreased to 0.93, H becomes 4.436 and we obtain a partition into 4 communities, shown in Fig. 5(c). This partition is identical to the one described by Newman [12]. Six overlapping nodes are detected, which constitute the fuzzy boundaries of the communities. Thus, partitions at different scales of δ reflect the multi-scale property of real networks.

Fig. 5. The community structure of the karate club network detected when δ (a) equals the optimal value 1.85, (b) decreases to 1.41 from the optimal value, and (c) further decreases to 0.933. In subgraphs (a), (b) and (c), communities are represented by different shapes and overlapping nodes are enclosed in dashed curves.


5.3 The scientific collaboration network 


The scientific collaboration network was collected by Girvan and Newman m and has been examined in Refs. [T5] [52]. It consists of 118 nodes (scientists or authors), and edges between them indicate co-authorship of one or more papers appearing in the archive. The collaborative ties in this network are not limited to papers on topics concerning networks; the interest is primarily in whether people know one another, and collaboration on any topic is a reasonable indicator of acquaintance.

The present method detects eight communities with the optimal δ = 1.493 and minimal entropy H = 5.447. Fig. 6(a) shows the community structure detected in the optimal situation, which is exactly the same as in Refs. [22] [TS]. This confirms that our partition is a good one. However, we believe our method can also produce a meaningful “coarse-grained” partition that is visually reasonable. We therefore increase δ from the optimal value to 1.749, for which the entropy is H = 6.483. Owing to the enlarged influence range of nodes, the number of communities decreases. In Fig. 6(b) we notice that some “uninfluential” communities, such as the light blue and yellow ones, are merged into the more powerful red and dark green communities, respectively. We finally obtain six communities, which can readily be interpreted by eye. Such multi-scale partitions are valuable for understanding the large-scale structure of these network data. Furthermore, the overlapping nodes enclosed in dashed curves in Fig. 6 are detected from their membership vectors. These nodes are generally located on the borders of two or more communities and represent authors with multiple research interests or cross-disciplinary backgrounds. Such nodes may also play a bridging role between communities in complex networks of other types. The ability to find overlapping nodes is a distinguishing feature of our method and is useful for revealing a natural characteristic of many social networks.

Fig. 6. The community structure of the scientific collaboration network obtained when δ (a) equals the optimal value 1.493 and (b) is increased from the optimal value to 1.749. In both subgraphs (a) and (b), overlapping nodes are enclosed in dashed curves.

5.4 A large scale semantic network

The semantic network from Ref. m contains 7207 phrases and 31784 edges. The edge weights are calculated from phrase co-occurrences. For visualization, our algorithm outputs a transformed adjacency matrix (in which the vertices of the same community are arranged together) with a hierarchical community structure. The distribution of community sizes is shown in Fig. 7. In total, 569 communities are detected with the optimal δ = 2.931 and minimal entropy H = 5.952. The maximum community size is 139, the minimum is 2, and the average is 12.57. An approximate power-law behavior can be seen: most communities are small and only a few are big. Among them, we have selected four interesting communities, listed as follows:

Community 1 = {Scientist, Inventor, Genius, Gifted, Brilliant, Intelligent, Smart, Science, Intelligence, Musician};

Community 2 = {Violin, Instrument, Cello, Band, Tuba, Clarinet, Orchestra, Trumpet, Trombone, Oboe, Woodwind, Symphony, Flute, Bass, Viola, Fiddle};

Community 3 = {Ovation, Sitting, Low, Descent, Up, Step, Ascend, Elevator, Ascent, Staircase, Stairwell, Climb, Steps, Ladder, Stairs, Wake, Stairway, Rise, Escalator, Stair, Down, Standing, Resting, Using};

Community 4 = {Nails, Hammer, Carpenter, Screw, Screwdriver, Tool, Pliers, Wrench, Sickle, Mechanic, Phillips}.

Fig. 7. The distribution of community sizes in a linear plot.

These four communities are all reasonable modules listed in Ref. m, and the elements of each community are semantically related. Among these elements, {Musician, Intelligence} are uncovered as overlapping nodes between communities 1 and 2, and {Using, Tool, Mechanic} are overlapping nodes between communities 3 and 4. These overlapping phrases clearly have fuzzy meanings and high phrase co-occurrence values.
As the inherent community structure of this large semantic network is unknown, it is worth using a measure to evaluate the performance of our method quantitatively. Here the popular modularity Q [7] [5], proposed by Newman and Girvan and heavily used for community detection in recent years, is adopted as a reference. Q is defined as

$$ Q = \sum_{i=1}^{c} \left[ \frac{l_i^{\mathrm{in}}}{L} - \left( \frac{d_i}{2L} \right)^{2} \right] \qquad (4) $$

where c is the number of communities, L is the total number of edges in the network, and l_i^in and d_i = 2 l_i^in + l_i^out are the number of edges and the sum of vertex degrees in the ith community, respectively, with l_i^out the number of edges between the ith community and the rest of the network. Fig. 8 compares the modularity Q with the topological entropy H across multiple scales of δ. The main trend is that the lower the value of H, the larger the value of Q. When δ reaches its optimal value, with H = 5.952, the modularity also reaches its maximum, Q = 0.521. This shows that the community structure of the network at that δ is strong and robust. In conclusion, our algorithm effectively uncovers the most suitable community scale in real-world networks.
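Eq. (4) can be evaluated directly from an edge list and a hard community assignment. A minimal Python sketch, in which the function name `modularity` and the input format are illustrative assumptions: l_i^in counts edges with both ends in community i, and d_i accumulates the degrees of its vertices, one endpoint at a time.

```python
def modularity(edges, community):
    """Q = sum_i [ l_i^in / L - (d_i / (2L))^2 ] for an undirected edge list."""
    L = len(edges)
    l_in, degree_sum = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1   # one endpoint in cu
        degree_sum[cv] = degree_sum.get(cv, 0) + 1   # one endpoint in cv
        if cu == cv:
            l_in[cu] = l_in.get(cu, 0) + 1
    return sum(l_in.get(c, 0) / L - (d / (2 * L)) ** 2
               for c, d in degree_sum.items())


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, community))  # 5/14 (about 0.357): two triangles joined by one edge
```

Putting every node in a single community gives Q = 0, since l^in = L and d = 2L; this is why Q rewards partitions whose internal edge fraction exceeds what the degree sums alone would suggest.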


Fig. 8. Comparison of the modularity Q with the topological entropy H across multiple scales of δ.

6 Conclusion

In summary, we have presented a novel community detection method based on local information in social networks. Instead of adopting a global approach, the algorithm focuses on local social ties and models the multi-scale social interactions that occur in those networks. It identifies leaders and then detects the communities located around them using random walk dynamics. The method not only supports overlapping community detection, using a membership vector to denote each node's involvement in each community, but can also describe different multi-resolution clusterings, allowing “coarse-grained” modules to be discovered in addition to the optimal partition. Applied to several typical real-world networks with well-defined community structures, it produced reasonable results, and the computational results on real social networks show that it can both detect accurate communities and extract the hierarchical structures of the networks. The method is therefore suitable for the accurate detection of community structure in complex networks.